JenniferWang
Contributor

The current naming is confusing. Here's the definition in TorchTitan:
https://github.com/pytorch/torchtitan/blob/aa000a3c42e8bb37e51f26eb3e3e024b37ccc479/torchtitan/config/job_config.py#L196

    local_batch_size: int = 8
    """Local batch size (i.e., per-device batch size)"""

    global_batch_size: int = -1
    """
    Global batch size (defaults to `training.local_batch_size * data-parallel degree`)
    """
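
To make the default concrete, here is a minimal sketch of how the `-1` sentinel could be resolved. The function name `resolve_global_batch_size` is hypothetical, and treating any negative value as "unset" is an assumption based on the docstring above, not a copy of TorchTitan's code.

```python
def resolve_global_batch_size(
    local_batch_size: int, global_batch_size: int, dp_degree: int
) -> int:
    """Return the effective global batch size.

    Assumption: a negative value (such as the -1 default) means "not set",
    in which case we fall back to local_batch_size * dp_degree.
    """
    if global_batch_size < 0:
        return local_batch_size * dp_degree
    return global_batch_size


# With the default of -1, 8 samples per device on 4 data-parallel ranks
# gives a global batch of 32, so no gradient accumulation is implied.
default_global = resolve_global_batch_size(8, -1, 4)

# An explicit value is passed through unchanged.
explicit_global = resolve_global_batch_size(8, 64, 4)
```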

In TorchTitan, `global_batch_size` is the knob for enabling gradient accumulation:
https://github.com/pytorch/torchtitan/blob/6a3a9da9564d82a1120c7639ef6236bb4cffa049/README.md?plain=1#L70C72-L70C89

https://github.com/pytorch/torchtitan/blob/6a3a9da9564d82a1120c7639ef6236bb4cffa049/torchtitan/experiments/forge/engine.py#L159

        # calculate gradient accumulation steps
        self.gradient_accumulation_steps = global_batch_size // (
            job_config.training.local_batch_size * dp_degree
        )
        assert self.gradient_accumulation_steps > 0
        self.loss_fn = rescale_accumulated_loss(
            self.loss_fn, self.gradient_accumulation_steps
        )
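
As a worked example of the arithmetic above, here is a standalone sketch. The helper names `compute_accumulation_steps` and `scale_loss_for_accumulation` are hypothetical stand-ins (the second only approximates what `rescale_accumulated_loss` is assumed to do), and the divisibility check is my own addition since the quoted snippet only uses floor division.

```python
from typing import Callable


def compute_accumulation_steps(
    local_batch_size: int, global_batch_size: int, dp_degree: int
) -> int:
    """Number of micro-batches each rank processes per optimizer step."""
    samples_per_micro_step = local_batch_size * dp_degree
    # Assumption: require an exact multiple so floor division never drops samples.
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            "global_batch_size must be a multiple of local_batch_size * dp_degree"
        )
    steps = global_batch_size // samples_per_micro_step
    assert steps > 0
    return steps


def scale_loss_for_accumulation(
    loss_fn: Callable[..., float], accumulation_steps: int
) -> Callable[..., float]:
    """Wrap loss_fn so each micro-batch contributes 1/accumulation_steps of the loss."""

    def scaled(*args, **kwargs):
        return loss_fn(*args, **kwargs) / accumulation_steps

    return scaled


# local_batch_size=8 on 4 data-parallel ranks covers 32 samples per micro-step,
# so a global batch of 128 needs 128 // 32 = 4 accumulation steps.
steps = compute_accumulation_steps(8, 128, 4)

# A toy loss that always returns 2.0 is scaled down to 2.0 / 4 = 0.5.
scaled_loss = scale_loss_for_accumulation(lambda: 2.0, steps)
```

The point for naming is that `local_batch_size` is the per-device micro-batch, while `global_batch_size` counts the samples consumed per optimizer step across all data-parallel ranks.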

@meta-cla meta-cla bot added the CLA Signed label Oct 10, 2025
@JenniferWang JenniferWang marked this pull request as ready for review October 10, 2025 13:32
@allenwang28
Contributor

this is a step in the right direction, but one nit/confusion is that the local_batch_size is the trainer's local batch size

But trainer_device_batch_size sounds verbose and somehow still wrong lol

@JenniferWang
Contributor Author

> this is a step in the right direction, but one nit/confusion is that the local_batch_size is the trainer's local batch size
>
> But trainer_device_batch_size sounds verbose and somehow still wrong lol

I think for now, it's less confusing to follow the naming from TorchTitan. Let's decide on better naming in the future.

@JenniferWang JenniferWang merged commit 06b972a into main Oct 10, 2025
8 checks passed